Crushing the Billion Row Taxi Data Benchmark
In the data world, one dataset in particular, known as "the taxi dataset," has been getting a disproportionate amount of attention lately. It records 1.2 billion individual taxi, limo, and Uber trips from January 2009 through June 2015 in staggering detail (full GPS, transaction type, passenger counts, timestamps). Released by the New York City Taxi & Limousine Commission, the dataset became a darling of the data science set while also emerging as a popular test of database query speed. One of the leaders on the database performance benchmarking side is Mark Litwintschik, a consultant, blogger, and database fanatic from the UK. Mark has tested more than 14 different databases/configurations using the dataset since it was first released in late 2015.